-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[compiler-rt] Fix frame numbering for unparsable frames. #148278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[compiler-rt] Fix frame numbering for unparsable frames. #148278
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-compiler-rt-sanitizer Author: Jesse Schwartzentruber (jschwartzentruber) ChangesThis can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. An example spidermonkey trace:
Without this change, the following symbolization is output:
The last frame has a duplicate number. With this change the numbering is correct:
Full diff: https://github.com/llvm/llvm-project/pull/148278.diff 1 Files Affected:
diff --git a/compiler-rt/lib/asan/scripts/asan_symbolize.py b/compiler-rt/lib/asan/scripts/asan_symbolize.py
index 058a1614b55e6..e70f987f03fe6 100755
--- a/compiler-rt/lib/asan/scripts/asan_symbolize.py
+++ b/compiler-rt/lib/asan/scripts/asan_symbolize.py
@@ -22,7 +22,6 @@
import argparse
import bisect
import errno
-import getopt
import logging
import os
import re
@@ -38,6 +37,7 @@
allow_system_symbolizer = True
force_system_symbolizer = False
+
# FIXME: merge the code that calls fix_filename().
def fix_filename(file_name):
if fix_filename_patterns:
@@ -507,20 +507,29 @@ def symbolize_address(self, addr, binary, offset, arch):
assert result
return result
- def get_symbolized_lines(self, symbolized_lines, inc_frame_counter=True):
+ def get_symbolized_lines(self, symbolized_lines):
if not symbolized_lines:
- if inc_frame_counter:
- self.frame_no += 1
- return [self.current_line]
- else:
- assert inc_frame_counter
- result = []
- for symbolized_frame in symbolized_lines:
- result.append(
- " #%s %s" % (str(self.frame_no), symbolized_frame.rstrip())
+ # If it is an unparsable frame, but contains a frame counter and address
+ # replace the frame counter so the stack is still consistent.
+ unknown_stack_frame_format = r"^( *#([0-9]+) +)(0x[0-9a-f]+) +.*"
+ match = re.match(unknown_stack_frame_format, self.current_line)
+ if match:
+ rewritten_line = (
+ self.current_line[: match.start(2)]
+ + str(self.frame_no)
+ + self.current_line[match.end(2) :]
)
self.frame_no += 1
- return result
+ return [rewritten_line]
+ # Not a frame line so don't increment the frame counter.
+ return [self.current_line]
+ result = []
+ for symbolized_frame in symbolized_lines:
+ result.append(
+ " #%s %s" % (str(self.frame_no), symbolized_frame.rstrip())
+ )
+ self.frame_no += 1
+ return result
def process_logfile(self):
self.frame_no = 0
@@ -546,8 +555,7 @@ def process_line_posix(self, line):
match = re.match(stack_trace_line_format, line)
if not match:
logging.debug('Line "{}" does not match regex'.format(line))
- # Not a frame line so don't increment the frame counter.
- return self.get_symbolized_lines(None, inc_frame_counter=False)
+ return self.get_symbolized_lines(None)
logging.debug(line)
_, frameno_str, addr, binary, offset = match.groups()
@@ -603,6 +611,7 @@ def _load_plugin_from_file_impl_py_gt_2(self, file_path, globals_space):
def load_plugin_from_file(self, file_path):
logging.info('Loading plugins from "{}"'.format(file_path))
globals_space = dict(globals())
+
# Provide function to register plugins
def register_plugin(plugin):
logging.info("Registering plugin %s", plugin.get_name())
@@ -779,9 +788,13 @@ def __str__(self):
arch=self.arch,
start_addr=self.start_addr,
end_addr=self.end_addr,
- module_path=self.module_path
- if self.module_path == self.module_path_for_symbolization
- else "{} ({})".format(self.module_path_for_symbolization, self.module_path),
+ module_path=(
+ self.module_path
+ if self.module_path == self.module_path_for_symbolization
+ else "{} ({})".format(
+ self.module_path_for_symbolization, self.module_path
+ )
+ ),
uuid=self.uuid,
)
|
This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. see: llvm/llvm-project#148278
This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. see: llvm/llvm-project#148278
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an OSS contributor, I don't have commit access, but wanted to leave some suggestions in case they help. Hope they're helpful!
module_path=( | ||
self.module_path | ||
if self.module_path == self.module_path_for_symbolization | ||
else "{} ({})".format( | ||
self.module_path_for_symbolization, self.module_path | ||
) | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checking - there's no change here, just formatting, right?
Was this intended or applied by some kind of auto-formatter? Just wondering if we should minimize the diff by reverting this.
As an aside - this assignment to module_path
is moderately complex. I wonder if we could make it simpler by expanding the ternary expression into a proper if-else
block. But that's probably out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the result of running black
on the file, since I'm touching the file. I can revert if minimal diff is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say authoritatively one way or another, but I'd imagine a minimal diff will make it easier to approve.
@@ -38,6 +37,7 @@ | |||
allow_system_symbolizer = True | |||
force_system_symbolizer = False | |||
|
|||
|
|||
# FIXME: merge the code that calls fix_filename(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this FIXME
still accurate? Otherwise, I'd expect this change to have no effect as it is no called as per the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your comment. A newline was added by black
but otherwise this FIXME
is unrelated to the PR, and I didn't touch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't super clear. I realize that the PR did not add the FIXME
:-) .
I was asking if you think the FIXME
is no longer accurate because the FIXME
implies that fix_filename
is not being called (specifically, that the code that invokes it is not merged yet). However, if that were true, then your code would have no effect, so I think the FIXME
is stale or wrong.
So I was wondering if you had context on it, in case we can remove the FIXME
. That's all! Obviously, somewhat out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sorry. I don't know anything about the purpose of the function or the FIXME
.
At the time the FIXME
was added, the file already contained calls to fix_filename
from the symbolize
methods of a few Symbolizer
subclasses. My guess is that the author hoped these callers could be merged into super().symbolize()
instead of using a helper function. The implementations have gotten more complex since then, so I doubt the comment is still relevant.
@davidmrdavid FYI since you've merged three pull requests (#125924, #122990, #117929) you're eligible for commit access if you wish: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access |
Thanks @thurstond, that would be wonderful! I'll look to follow those instructions later :) Thanks! |
This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack.
c3e8617
to
7aa8d35
Compare
This can happen when JIT code is run, and we can't symbolize those frames, but they should remain numbered in the stack. An example spidermonkey trace:
Without this change, the following symbolization is output:
The last frame has a duplicate number. With this change the numbering is correct: